I claim: 



1. A computerized data mining method for automatically determining a prediction model for a 
dependent data mining variable based on at least one independent data mining variable, said 
method comprising the following steps: 

a variable replacement step (103) replacing said independent data mining variable with potential 
values from a global range by a multitude of independent local data mining variables, 
each independent local data mining variable with potential values from a subrange of said 
global range; 

an initialization step (104) initializing a current prediction model; 

a looping sequence (105-108) including a first step (106) having substeps of 

determining for every independent local data mining variable not yet reflected in said 
current prediction model a multitude of partial regression functions, each partial 
regression function depending only on one of said independent local data mining 
variables; 

determining for each of said partial regression functions a significance value;

selecting the most significant partial regression function and the corresponding not
yet reflected local data mining variable; and

a second step (107) of adding said most significant partial regression function to said
current prediction model and of associating said corresponding local data mining variable
with said significance value.
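
For illustration only, the looping sequence of claim 1 can be sketched in Python as a greedy forward selection. The sketch rests on several assumptions that are not taken from the claim: each partial regression function is a straight line, the squared correlation stands in for the significance value, and each candidate is fitted to the residual of the current model rather than refitted jointly. The names `fit_line` and `forward_select` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line y ~ a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return my, 0.0, 0.0  # constant input or target: nothing to explain
    b = sxy / sxx
    return my - b * mx, b, (sxy * sxy) / (sxx * syy)


def forward_select(local_vars, y):
    """local_vars: dict mapping a local variable name to its list of values.

    Returns the model as a list of (name, a, b, score) in order of inclusion.
    """
    residual = list(y)
    model = []  # initialization step: empty current prediction model
    remaining = set(local_vars)
    while remaining:
        # First step: one candidate function per variable not yet in the model,
        # scored by squared correlation (an assumed stand-in for significance).
        candidates = {name: fit_line(local_vars[name], residual) for name in remaining}
        best = max(sorted(candidates), key=lambda name: candidates[name][2])
        a, b, score = candidates[best]
        # Second step: add the best function and record its score with the variable.
        model.append((best, a, b, score))
        residual = [r - (a + b * x) for r, x in zip(residual, local_vars[best])]
        remaining.remove(best)
    return model
```

In this sketch the loop ends when every local variable is reflected in the model; the dependent claims describe other acceptance and termination criteria.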

2. The method according to claim 1, wherein in said second step said most significant partial
regression function is added only if its inclusion improves the adjusted correlation coefficient of
the prediction model, and otherwise said local data mining variable corresponding to
said most significant partial regression function is excluded from said method.
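
As a hedged illustration of the acceptance test in claim 2, the sketch below assumes that the "adjusted correlation coefficient" corresponds to the conventional adjusted R², which penalizes each additional model term. That reading, and the names `adjusted_r2` and `should_add`, are assumptions and are not stated in the claim.

```python
def adjusted_r2(r2, n_samples, n_terms):
    """Conventional adjusted R^2 for a model with n_terms regressors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_terms - 1)


def should_add(r2_before, r2_after, n_samples, terms_before):
    """True if adding one more term improves the adjusted R^2."""
    before = adjusted_r2(r2_before, n_samples, terms_before)
    after = adjusted_r2(r2_after, n_samples, terms_before + 1)
    return after > before
```

With 20 training records and two terms already in the model, raising R² from 0.80 to 0.805 would not justify a third term under this measure, whereas raising it to 0.85 would.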

3. The method according to claim 2, wherein in said second step said most significant partial
regression function is added if its significance is above a threshold significance, and wherein said 
looping sequence includes a third step of determining if the significance of a certain partial 
regression function comprised within said current prediction model is reduced after execution of 
said second step and, in the affirmative case, removing said certain partial regression function 
with its corresponding local data mining variable from said current prediction model. 

4. The method according to claim 3, wherein said looping sequence terminates if all local data 
mining variables are reflected in said current prediction model. 

5. The method according to claim 3, wherein said looping sequence terminates if the significance 
of said most significant partial regression function is below a second threshold significance. 

6. The method according to claim 1, wherein in said initialization step said initialized current 
prediction model is empty. 

7. The method according to claim 1, wherein said partial regression functions are regression 
polynomials. 

8. The method according to claim 7, wherein said significance is determined by calculating the 
significance of all powers of a regression polynomial and using the minimum significance of said 
powers as significance measure of said regression polynomial. 

9. The method according to claim 8, wherein said calculating of said significance of said powers 
is based on F-test values for coefficients of said powers. 
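
The minimum rule of claims 8 and 9 can be illustrated as follows. The sketch assumes that coefficient estimates and their standard errors are already available from a fitted regression polynomial, that the intercept is not counted among the powers, and that the raw F statistic (the squared ratio of a coefficient to its standard error) is used directly as the significance of each power. The claims state only that the calculation is based on F-test values, so these choices and the name `polynomial_significance` are assumptions.

```python
def polynomial_significance(coefficients, std_errors):
    """Return (minimum F value, list of F values) over the powers of a polynomial.

    coefficients[i] and std_errors[i] belong to the power x**(i + 1).
    """
    # For a single coefficient, the partial F statistic equals the squared t statistic.
    f_values = [(b / se) ** 2 for b, se in zip(coefficients, std_errors)]
    return min(f_values), f_values
```

Taking the minimum means that a polynomial is rated only as highly as its weakest power, so one poorly supported power lowers the rating of the whole polynomial.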

10. The method according to claim 7, wherein said multitude of regression polynomials within 
said first step is determined by determining regression polynomials of all degrees up to a 
maximum degree M. 

11. The method according to claim 1, wherein in said variable replacement step said global range
is defined by its center defined by the mean value of training data used for the determination of 
the prediction model, and is defined by a lower and upper limit with a distance from said center 
being a predefined multiple of the standard deviation of said training data, and said subranges 
and said corresponding local data mining variables are defined as a fixed number H of subranges 
by dividing said global range into H equidistant subranges. 
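
A minimal sketch of the range construction in claim 11 follows. It assumes the population standard deviation (the claim does not say which estimator is meant), and the parameter names `k` for the predefined multiple and `h` for the number of subranges are hypothetical.

```python
from statistics import mean, pstdev


def equidistant_subranges(training_values, k=2.0, h=4):
    """Split [mean - k*std, mean + k*std] into h equal-width (lower, upper) subranges."""
    center = mean(training_values)
    spread = k * pstdev(training_values)  # assumption: population standard deviation
    lower, upper = center - spread, center + spread
    width = (upper - lower) / h
    # Use the exact upper limit as the last edge to avoid rounding drift.
    edges = [lower + i * width for i in range(h)] + [upper]
    return list(zip(edges[:-1], edges[1:]))
```

For training values with mean 5 and standard deviation 2, the defaults `k=2.0` and `h=4` give the global range 1 to 9 and four subranges of width 2.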

12. The method according to claim 11, wherein said subranges and said corresponding local
data mining variables are of variable size defined by the following steps: 

a. an initial step of dividing said global range into a maximum number H of equidistant subranges;

b. an iteration step of selecting a certain subrange for which the number of said training data 
falling into said certain subrange is below a third threshold Np and joining said certain subrange 
with a neighbor subrange forming a larger subrange; and 

c. a termination step terminating said iteration step if for each subrange the number of said 
training data falling into said each subrange is equal to or above said third threshold. 
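
Steps a to c of claim 12 can be sketched as the merging loop below. The claim does not specify which neighbor is joined or how boundary values are counted, so the sketch assumes half-open intervals, joins a sparse subrange with its less populated neighbor, and stops early if only one subrange remains. The name `merge_sparse_subranges` and the parameter `n_min` (the third threshold) are hypothetical.

```python
def merge_sparse_subranges(edges, values, n_min):
    """Merge subranges holding fewer than n_min training values into a neighbor.

    edges: sorted boundaries of the initial H equidistant subranges.
    Returns the final list of (lower, upper) subranges.
    """
    bins = list(zip(edges[:-1], edges[1:]))

    def count(index):
        lo, hi = bins[index]
        is_last = index == len(bins) - 1
        # Half-open intervals [lo, hi); the last one also includes its upper limit.
        return sum(1 for v in values if lo <= v < hi or (is_last and v == hi))

    while len(bins) > 1:
        counts = [count(i) for i in range(len(bins))]
        sparse = [i for i, c in enumerate(counts) if c < n_min]
        if not sparse:
            break  # termination step: every subrange is populated enough
        i = sparse[0]
        # Assumed rule: join with the less populated neighbor (left one on a tie).
        if i == 0:
            j = 1
        elif i == len(bins) - 1:
            j = i - 1
        else:
            j = i - 1 if counts[i - 1] <= counts[i + 1] else i + 1
        left, right = min(i, j), max(i, j)
        bins[left:right + 1] = [(bins[left][0], bins[right][1])]
    return bins
```

Each pass removes one subrange, so the loop always terminates after at most H - 1 merges.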

13. The method according to claim 11, wherein said local data mining variables are augmented 
by the following subranges and corresponding independent local data mining variables: 

a local data mining variable representing a subrange from -∞ up to said lower limit of
said global range; and

a local data mining variable representing a subrange from said upper limit of said
global range up to +∞.
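
The augmentation of claim 13 can be illustrated by adding two open-ended subranges around an existing list. The sketch assumes half-open intervals, so a value exactly at the upper limit of the global range is assigned to the upper tail; the claim does not fix this convention, and the names `augment_with_tails` and `locate` are hypothetical.

```python
import math


def augment_with_tails(subranges):
    """Add (-inf, lower limit) and (upper limit, +inf) around the global range."""
    lower_limit = subranges[0][0]
    upper_limit = subranges[-1][1]
    return [(-math.inf, lower_limit)] + list(subranges) + [(upper_limit, math.inf)]


def locate(value, augmented):
    """Index of the subrange containing value, using half-open intervals [lo, hi)."""
    for index, (lo, hi) in enumerate(augmented):
        if lo <= value < hi:
            return index
    raise ValueError("value could not be placed in any subrange")
```

With the tails in place, every finite input value maps to exactly one local data mining variable, including outliers far from the training data.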

14. A computer system comprising means adapted for carrying out the steps of the method 
according to claim 1. 

15. A computer program product comprising a machine-readable medium having 
computer-executable program instructions thereon including code means for causing a computer 
to perform a method according to claim 1.